Acknowledgements

  • Kathryn B. Newhart | Department of Civil and Environmental Engineering | Colorado School of Mines
  • Tzahi Y. Cath | Department of Civil and Environmental Engineering | Colorado School of Mines
  • Amanda S. Hering | Department of Statistical Science | Baylor University

Overview of Topics

  • Introduction & Motivating Example
  • Multi-State AD-PCA
  • Simulation Study
  • Case Study Results
  • Software Package for R
  • Summary and References

Introduction & Motivating Example

Introduction

  • Many factories or other closed production systems use real-time online process monitoring.
  • We aim to detect potential problems within the system before they damage equipment or force the process to shut down.
  • Data description:
    • non-Gaussian process
    • multiple correlated variables
    • temporal dependence of observations
    • non-stationary due to operator input and ambient conditions

Process Dimensionality

  • In multivariate contexts, the number of parameters to estimate increases quadratically with dimension.
  • That is, we must estimate \(p + \frac{p}{2}(p + 1) \in \mathcal{O}(p^2)\) first- and second-moment parameters if we have \(p\) features.
  • Linear dimension reduction (LDR) allows us to simplify complex data structures via linear combinations of features.
  • LDR is built around eigen-decomposition / principal component analysis (PCA) or the singular value decomposition (SVD).
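
As a toy illustration (a numpy sketch with hypothetical data, not from the slides), both the quadratic parameter growth and the SVD/eigendecomposition route to LDR can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 500, 5, 2

# p mean parameters plus p(p + 1)/2 covariance parameters: O(p^2) growth
n_params = p + p * (p + 1) // 2   # 20 parameters for p = 5

X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)            # center the features

# PCA via the SVD: the right singular vectors of X are the
# eigenvectors of the scatter matrix X^T X
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P_d = Vt[:d].T                    # p x d projection matrix
Y = X @ P_d                       # n x d reduced-dimension scores

# the eigendecomposition of the scatter matrix recovers the same spectrum
eigvals = np.linalg.eigvalsh(X.T @ X)
assert np.allclose(np.sort(eigvals)[::-1], s**2)
```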

Facility and Process Details

  • This work is motivated by our partnership with the Colorado School of Mines on the ReNUWit water preservation grant.
  • Our team manages a decentralized wastewater treatment (WWT) plant in Golden, CO.
  • We measure 40 features and their lagged values (80 total features).
  • The continuous features are aggregated to the minute-level.
  • We aim to quickly and accurately detect process malfunctions before human operators by incorporating known system state behavior.
  • Standard PCA was applied to industrial process monitoring by Wise et al. (1988) and to chemical process monitoring by Kresta et al. (1991).
  • Baggiani and Marsili-Libelli (2009) used adaptive PCA, Sanchez-Fernandez et al. (2015) used distributive PCA, and Kazor et al. (2016) used adaptive-dynamic PCA to monitor WWT processes.

The Bioreactor System

Multi-State Process

Example Feature by Blower

Multi-State AD-PCA

PCA Basics

  • \(\textbf{X} := [\textbf{X}_1\ \vdots\ \textbf{X}_2\ \vdots\ \cdots\ \vdots\ \textbf{X}_n]^T \in \mathbb{R}_{n\times p}\), where \(\textbf{X}_s := \left(X_1(s), X_2(s), \ldots, X_p(s)\right)\) is a realization from process \(\mathcal{P}\) on index \(s = 1, \ldots, n\).
  • \(\textbf{X}^T\textbf{X} := \textbf{P} \times \boldsymbol\Lambda \times \textbf{P}^T\) is the eigendecomposition of the scatter matrix.
  • \(\textbf{P}\) is the \(p \times p\) orthogonal matrix of eigenvectors.
  • \(\boldsymbol\Lambda\) is the diagonal matrix of the \(p\) eigenvalues.
  • \(\textbf{P}_d\) is the \(p\times d\) projection matrix preserving the total data variation (energy) corresponding to \[ \left[\sum\limits_{i = 1}^d \lambda_i\right] / \text{tr}\left(\boldsymbol\Lambda\right). \]
  • \(\textbf{Y}_s := \textbf{X}_s\textbf{P}_d\) is the reduced-dimension representation of observation \(\textbf{X}_s\).
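
A minimal numpy sketch of these steps (hypothetical data; the 90% energy cutoff is an illustrative choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # correlated features
X = X - X.mean(axis=0)

# eigendecomposition of the scatter matrix: X^T X = P Lambda P^T
eigvals, P = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues descending
eigvals, P = eigvals[order], P[:, order]

# keep the smallest d preserving 90% of the energy sum(lambda_i) / tr(Lambda)
energy = np.cumsum(eigvals) / eigvals.sum()
d = int(np.searchsorted(energy, 0.90)) + 1
P_d = P[:, :d]                           # p x d projection matrix
Y = X @ P_d                              # reduced-dimension representation
```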

Competing Methods

PCA is non-optimal because of the autocorrelation and non-linearity / non-stationarity of the data. Some alternatives are:

  • Adaptive-Dynamic PCA (AD-PCA) of Kazor et al. (2016)
  • Kernel PCA (kPCA) of Ge et al. (2009)
  • Adaptive kPCA (AkPCA) of Chouaib et al. (2013)
  • Locally Linear Embedding (LLE) of Miao et al. (2013)
  • Multi-dimensional scaling and isometric mapping (IsoMap) of Tenenbaum et al. (2000)
  • Semidefinite Embedding (SDE) / Maximum Variance Unfolding (MVU) of Weinberger and Saul (2006)

We combine Adaptive, Dynamic, and Multi-State modifications to PCA.

Adaptive PCA

  • Assume \(\mathcal{P}\) is locally linear within a window of width \(w + n_u\).
  • Create a \(w\)-width rolling training window preceding index \(s\) by defining \[ \textbf{X}_{w} := [\textbf{X}_{s - w + 1}\ \vdots\ \textbf{X}_{s - w + 2}\ \vdots\ \cdots\ \vdots\ \textbf{X}_{s - 1}\ \vdots\ \textbf{X}_s]^T. \]
  • Calculate the scatter matrix with \(\textbf{X}_{w}\) instead of the full data matrix \(\textbf{X}\).
  • Estimate \(\textbf{P}_d\).
  • After performing necessary monitoring statistics on the new observations, ''learn'' the newest \(n_u\) observations, ''forget'' the oldest \(n_u\) observations, and estimate the new scatter matrix.
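
These steps can be sketched as follows (a hypothetical numpy implementation; the data and window sizes are illustrative):

```python
import numpy as np

def adaptive_projection(X, s, w, d):
    """Estimate P_d from the w-width training window ending at index s."""
    X_w = X[s - w + 1 : s + 1]            # rolling window X_w
    X_w = X_w - X_w.mean(axis=0)
    eigvals, P = np.linalg.eigh(X_w.T @ X_w)
    return P[:, np.argsort(eigvals)[::-1][:d]]

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
w, n_u, d = 300, 60, 2

P_d = adaptive_projection(X, s=499, w=w, d=d)
# after monitoring the next n_u observations, "learn" the newest n_u,
# "forget" the oldest n_u, and re-estimate:
P_d_next = adaptive_projection(X, s=499 + n_u, w=w, d=d)
```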

Dynamic PCA

  • Because the features are correlated over time, we include up to \(\ell\) lags per feature.
  • The observation vector at index \(s\) is now \[ \begin{align} \text{L}(\textbf{X}_s) := &[X_1(s), X_1\left( s - 1 \right), \ldots, X_1\left( s - \ell \right),\ X_2(s), X_2\left( s - 1 \right), \ldots, X_2\left( s - \ell \right), \\ &\qquad \cdots,\ X_p(s), X_p\left( s - 1 \right), \ldots, X_p\left( s - \ell \right)]. \end{align} \]
  • These \(n - \ell\) rows form the lagged data matrix \[ \text{L}(\textbf{X}) := [\text{L}(\textbf{X}_{\ell})\ \vdots\ \text{L}(\textbf{X}_{\ell + 1})\ \vdots\ \cdots\ \vdots\ \text{L}(\textbf{X}_n)]^T \in\mathbb{R}_{(n - \ell) \times p(\ell + 1)}. \]
  • Calculate the scatter matrix with \(\text{L}(\textbf{X})\) instead of the data matrix \(\textbf{X}\).
  • Estimate the \(p(\ell + 1) \times d\) projection matrix \(\textbf{P}_d\).
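
Constructing the lagged data matrix \(\text{L}(\textbf{X})\) in numpy (a sketch with hypothetical data):

```python
import numpy as np

def lag_matrix(X, ell):
    """Augment each feature with its ell lagged copies, feature by feature."""
    n, p = X.shape
    cols = []
    for j in range(p):
        for k in range(ell + 1):          # X_j(s), X_j(s - 1), ..., X_j(s - ell)
            cols.append(X[ell - k : n - k, j])
    return np.column_stack(cols)          # (n - ell) x p(ell + 1)

rng = np.random.default_rng(3)
n, p, ell = 100, 3, 2
X = rng.normal(size=(n, p))
LX = lag_matrix(X, ell)
```

Note the slice end `n - k` equals `n` when `k = 0`, so the current values are kept in the first column of each feature's block.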

Multi-State PCA

  • Let \(\mathcal{S}_k,\ k = 1, \ldots, K\) be a set of disjoint process states.
  • For the index \(s\), the observation \(\textbf{X}_{s}\) is said to belong in state \(\mathcal{S}_k\) if and only if \[ \textbf{X}_{s} \sim f_k\left( \boldsymbol\mu_k, \boldsymbol\Sigma_k \right). \]
  • \(f_k\) is some distribution with location parameter \(\boldsymbol\mu_k \in \mathbb{R}_{p\times 1}\) and positive semi-definite scale parameter \(\boldsymbol\Sigma_k \in \mathbb{R}^{\ge}_{p\times p}\).
  • Because the distribution changes from state to state, the principal components are also a function of the state.
  • Partition \(\textbf{X}\) as \(\left\{\textbf{X}^{(1)}, \textbf{X}^{(2)}, \ldots, \textbf{X}^{(K)}\right\}\), where \(\textbf{X}^{(k)}\) is the ordered set of observations under state \(\mathcal{S}_k\).
  • Estimate \(K\) distinct projection matrices \(\left\{\textbf{P}_d^{(1)}, \textbf{P}_d^{(2)}, \ldots, \textbf{P}_d^{(K)}\right\}\) from the partitioned \(\textbf{X}\).
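
In numpy, with known state labels (hypothetical data and labels):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, d, K = 900, 4, 2, 3
X = rng.normal(size=(n, p))
state = rng.integers(0, K, size=n)       # known state label per observation

# estimate one projection matrix per state from the partitioned data
P_d = {}
for k in range(K):
    X_k = X[state == k]                  # observations under state S_k
    X_k = X_k - X_k.mean(axis=0)
    eigvals, P = np.linalg.eigh(X_k.T @ X_k)
    P_d[k] = P[:, np.argsort(eigvals)[::-1][:d]]
```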

Monitoring Statistics

These statistics are estimated non-parametrically, as discussed by Kazor et al. (2016).

Hotelling's \(T^2\)

  • The \(T^2\) statistic is the Mahalanobis distance of \(\textbf{Y}_{s + 1}\), the new observation mapped from the original space into the PCA subspace.
  • The \(T^2\) statistic is \(T^2_{s + 1} = \textbf{Y}_{s + 1}\boldsymbol\Lambda_d^{-1} \textbf{Y}_{s + 1}^T\), for \(\boldsymbol\Lambda_d := \text{diag}\left( \lambda_1, \lambda_2, \ldots, \lambda_d \right)\).
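
A numpy sketch of the \(T^2\) computation, using the scatter-matrix eigenvalues for \(\boldsymbol\Lambda_d\) as in the formula above (hypothetical data; thresholds omitted):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # training window
X = X - X.mean(axis=0)

eigvals, P = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
d = 2
P_d = P[:, order[:d]]
Lambda_d_inv = np.diag(1.0 / eigvals[order[:d]])

x_new = rng.normal(size=4)               # incoming observation X_{s+1}
y_new = x_new @ P_d                      # mapped value Y_{s+1}
T2 = y_new @ Lambda_d_inv @ y_new        # Hotelling's T^2
```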

Squared Prediction Error

  • The SPE statistic measures the goodness-of-fit of the \(d\)-dimensional model to the \(p\)-dimensional process observations, or how well \(\textbf{P}_d\) approximates \(\textbf{P}\).
  • The SPE statistic is \(\text{SPE}(\textbf{Y}_{s + 1}) = \left( \textbf{X}_{s + 1} - \textbf{Y}_{s + 1}\textbf{P}^T_d \right) \left( \textbf{X}_{s + 1} - \textbf{Y}_{s + 1}\textbf{P}^T_d \right)^T.\)
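
A numpy sketch of the SPE computation (hypothetical data), which also shows that the residual vanishes when \(d = p\):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
X = X - X.mean(axis=0)

eigvals, P = np.linalg.eigh(X.T @ X)
P_d = P[:, np.argsort(eigvals)[::-1][:2]]  # keep d = 2 of p = 4 components

x_new = rng.normal(size=4)                 # incoming observation X_{s+1}
y_new = x_new @ P_d                        # mapped value Y_{s+1}
residual = x_new - y_new @ P_d.T           # part of x_new the model misses
SPE = residual @ residual                  # squared prediction error

# with d = p the projection is exact, so the residual (and SPE) is zero
assert np.allclose(x_new - (x_new @ P) @ P.T, 0)
```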

Simulation of NOC Observations

Details of Data Generation

We follow Kazor et al. (2016) and further update the original design of Dong and McAvoy (1996). Thus:

  • The errors of \(t\), \(\varepsilon_s\), are drawn from a first-order autoregressive process over \(s\) with innovations \(\varepsilon \sim N(0, \sigma_{\varepsilon})\),
  • The autocorrelation parameter is \(\varphi = 0.75\),
  • The process dimension \(p = 3\),
  • The cycle length is \(\omega = 7 \times 24 \times 60 = 10{,}080\), corresponding to one observation each minute over one week,
  • The observation index \(s = 1, \ldots, \omega\), and
  • The observation at index \(s\) is given by \(\textbf{X}_s := \left\langle x_1 = X_1(t_s), x_2 = X_2(t_s), \ldots, x_p = X_p(t_s) \right\rangle\).

Errors and Latent Feature Construction

  1. Draw the first innovation from \[ \varepsilon_1 \sim \mathcal{N}\left(\frac{1}{2}(a + b)(1 - \varphi), \frac{(b - a)^2}{12}(1 - \varphi^2)\right), \] where \(a = 0.01\) and \(b = 2\). These mean and variance multipliers come from the mean and variance of the uniform\((a,b)\) distribution.
  2. Define the AR(1) error structure as \(\varepsilon_s := \varphi \varepsilon_{s - 1} + (1 - \varphi) \varepsilon,\quad s = 2, 3, \ldots\), where \(\varepsilon\) is a fresh innovation at each index.
  3. Draw the latent feature \(t\) as \[ t_s := -\cos\left(\frac{2\pi}{\omega}s\right) + \varepsilon_s,\quad s = 1, \ldots, \omega. \]
  4. Scale \(t_s\) into \([a, b]\).
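
The four steps above, sketched in numpy (the innovation scale inside the AR(1) recursion is not specified on the slide, so the unit standard deviation below is an assumption):

```python
import numpy as np

rng = np.random.default_rng(7)
a, b, phi = 0.01, 2.0, 0.75
omega = 7 * 24 * 60                       # one minute-level week

# 1. first innovation from the uniform(a, b)-derived mean and variance
eps = np.empty(omega)
eps[0] = rng.normal((a + b) / 2 * (1 - phi),
                    np.sqrt((b - a) ** 2 / 12 * (1 - phi ** 2)))

# 2. AR(1) error structure (innovation sd of 1 is an illustrative assumption)
for s in range(1, omega):
    eps[s] = phi * eps[s - 1] + (1 - phi) * rng.normal(0, 1)

# 3. latent feature: one cosine cycle per week plus AR(1) noise
t = -np.cos(2 * np.pi * np.arange(omega) / omega) + eps

# 4. rescale t into [a, b]
t = a + (b - a) * (t - t.min()) / (t.max() - t.min())
```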

Feature and State Definitions

  1. Sample three machine-error vectors of length \(\omega\), \(\textbf{e}_1, \textbf{e}_2, \textbf{e}_3\overset{i.i.d.}{\sim}\mathcal{N}(0, 0.01)\).
  2. Define the three feature vectors to be \[ \begin{align*} x(t_s) &:= t_s + \textbf{e}_1(s) \\ y(t_s) &:= t_s^2 - 3t_s + \textbf{e}_2(s) \\ z(t_s) &:= -t_s^3 + 3t_s^2 + \textbf{e}_3(s). \end{align*} \]
  3. Change states hourly in sequence, where the three process states are

    • \(\mathcal{S}_1\): \(\textbf{X}(t_s) := \langle x(t_s), y(t_s), z(t_s) \rangle \cdot \textbf{I}\)
    • \(\mathcal{S}_2\): \(\textbf{X}(t_s) := \langle x(t_s), y(t_s), z(t_s) \rangle \cdot \textbf{P}_2 \boldsymbol\Lambda_2\)
    • \(\mathcal{S}_3\): \(\textbf{X}(t_s) := \langle x(t_s), y(t_s), z(t_s) \rangle \cdot \textbf{P}_3 \boldsymbol\Lambda_3\)
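
Generating the three features and applying the state projections in numpy (taking 0.01 as the machine-error variance; the latent \(t_s\) here is a uniform stand-in draw, not the cosine construction):

```python
import numpy as np

rng = np.random.default_rng(8)
omega = 7 * 24 * 60
t = rng.uniform(0.01, 2.0, size=omega)    # stand-in for the scaled latent t_s

# machine errors (variance 0.01, so sd 0.1) and the three feature vectors
e1, e2, e3 = rng.normal(0, 0.1, size=(3, omega))
x = t + e1
y = t ** 2 - 3 * t + e2
z = -t ** 3 + 3 * t ** 2 + e3
X = np.column_stack([x, y, z])

# states change hourly in sequence: S1, S2, S3, S1, ...
state = (np.arange(omega) // 60) % 3

P2 = np.array([[0, 0.50, -0.87], [0, 0.87, 0.50], [1, 0, 0]])
L2 = np.diag([1, 0.5, 2])
P3 = np.array([[0, 0.87, -0.50], [-1, 0, 0], [0, 0.50, 0.87]])
L3 = np.diag([0.25, 0.1, 0.75])

X[state == 1] = X[state == 1] @ (P2 @ L2)  # rotate and rescale state S2
X[state == 2] = X[state == 2] @ (P3 @ L3)  # rotate and rescale state S3
```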

Process Projections

The process-specific scaling and projection matrices are: \[ \begin{gather} \textbf{P}_2 = \begin{bmatrix} 0 & 0.50 & -0.87 \\ 0 & 0.87 & 0.50 \\ 1 & 0 & 0 \end{bmatrix} & & \boldsymbol\Lambda_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 2 \end{bmatrix} \\ & & \\ \textbf{P}_3 = \begin{bmatrix} 0 & 0.87 & -0.50 \\ -1 & 0 & 0 \\ 0 & 0.50 & 0.87 \end{bmatrix} & & \boldsymbol\Lambda_3 = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.75 \end{bmatrix} \end{gather} \] \(\textbf{P}_2\) and \(\textbf{P}_3\) rotate the features so that the states are at right angles in at least one dimension. \(\boldsymbol\Lambda_2\) and \(\boldsymbol\Lambda_3\) inflate or deflate the feature variances along each principal component.
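
A quick numerical check that \(\textbf{P}_2\) and \(\textbf{P}_3\) are indeed rotations, orthogonal up to the two-decimal rounding of their entries:

```python
import numpy as np

P2 = np.array([[0, 0.50, -0.87], [0, 0.87, 0.50], [1, 0, 0]])
P3 = np.array([[0, 0.87, -0.50], [-1, 0, 0], [0, 0.50, 0.87]])

# both satisfy P P^T = I to within the rounding error of the entries
assert np.allclose(P2 @ P2.T, np.eye(3), atol=0.01)
assert np.allclose(P3 @ P3.T, np.eye(3), atol=0.01)
```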

Projections Visualized

Monte Carlo Replication

We set the threshold to \(\alpha = 0.001\) and repeat the following 1000 times:

  1. Draw a random set of observations: one set of NOC under the single-state model (\(\mathcal{S}_1\) only) and one under the multi-state model.
  2. Apply each of the six single-state and nine multi-state faults (discussed in the next section) to the NOC data sets. With the two NOC sets themselves, there will be 17 total versions of the data set.
  3. Train the fault detection system with 75% of the observations:
    • 7,560 observations for the single-state model,
    • 2,520 observations per state for the multi-state model.
  4. Calculate AD- and MSAD-PCA projections for all 17 data sets.

Performance Metrics

An observation is flagged as suspicious if the monitoring statistic is beyond the calculated non-parametric threshold. An observation will trigger an alarm if it is the \(k^{th}\) observation in a row to be flagged. For this simulation study, we triggered an alarm at the third consecutive flag.
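
The flag-then-alarm logic can be sketched as follows (a hypothetical helper, not the package's implementation):

```python
def alarm_indices(flags, k=3):
    """Return indices that trigger an alarm: the k-th consecutive flag."""
    alarms, run = [], 0
    for i, flagged in enumerate(flags):
        run = run + 1 if flagged else 0   # length of the current flag streak
        if run >= k:
            alarms.append(i)
    return alarms

# flags: True when a monitoring statistic exceeds its threshold
flags = [False, True, True, True, True, False, True, True]
print(alarm_indices(flags))               # [3, 4]
```

The isolated pair of flags at the end never reaches three in a row, so it raises no alarm.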

For each statistic under AD- and MSAD-PCA, measure

  • The false alarm rate from the NOC data sets. This is the number of false alarms recorded in the process observations without faults.
  • The fault detection indicator for each of the 15 fault data sets.
  • The time to detection; the earliest a statistic can trigger an alarm is after 3 minutes.

Fault Induction

Fault Overview

Induce one of the following faults at \(s = 8,500\).

| Features Affected | Shift Fault | Drift Fault | Latent or Error Fault |
|---|---|---|---|
| All features equally | Fault 1A: \((x, y, z)\) | Fault 2A: \((x, y, z)\) | Fault 3A: \((x, y, z)\) |
| Each feature differently | Fault 1B: \((x)\) | Fault 2B: \((y, z)\) | Fault 3B: \((z)\) |
| Each state differently | Fault 1C: \((x, z)\in\mathcal{S}_3\) | Fault 2C: \(y\in\mathcal{S}_2\) | Fault 3C: \(y\in\mathcal{S}_2\) |

Normal Process

A single draw from the multivariate process under NOC. The vertical black line (21:40 on 2 December) marks the time at which the fault would be induced.

Fault 1B

\(x^*(t_s) = x(t_s) + 2,\ s \ge 8500.\) This shift in feature \(x\) propagates into features \(y\) and \(z\) through \(\textbf{P}_2\boldsymbol\Lambda_2\) and \(\textbf{P}_3\boldsymbol\Lambda_3\).

Fault 2C

\(y^*(t_s) = y(t_s) - 1.5 \times \frac{s - 8500}{10080 - 8500}\), for \(s > 8500\) and \(y\in\mathcal{S}_2.\) This fault is induced after state projections.

Fault 3A

\(\textbf{X}^*(t_s) = \textbf{X}(t_s^*)\), where \(t_s^* = \left[\frac{5(s - 8500)}{\omega - 8500} + 1\right]t_s\) for \(s > 8500.\)
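
The shift and drift faults above can be induced as follows in numpy (the NOC feature here is a stand-in sinusoid, not the simulated process):

```python
import numpy as np

omega, s0 = 10080, 8500
s = np.arange(omega)
x = np.sin(2 * np.pi * s / omega)         # stand-in NOC feature

# shift fault (Fault 1B style): constant +2 shift after induction time s0
x_shift = x.copy()
x_shift[s >= s0] += 2

# drift fault (Fault 2C style): linear ramp to -1.5 by the end of the cycle
y = np.cos(2 * np.pi * s / omega)
y_drift = y.copy()
mask = s > s0
y_drift[mask] -= 1.5 * (s[mask] - s0) / (omega - s0)
```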

Simulation Study Results

False Alarm Rates and Detection Proportions

Detection Times

Comparison

Under the Multi-State model:

  • The AD-PCA SPE monitoring statistic is not sensitive enough to detect faults reliably.
  • The AD-PCA \(T^2\) monitoring statistic detects faults reliably, but has far too many false alarms.
  • The MSAD-PCA SPE monitoring statistic offers an excellent combination of sensitivity and high detection probability.
  • The MSAD-PCA \(T^2\) monitoring statistic offers an excellent combination of specificity and high detection probability.

Under the Single-State model, the pairs of AD-PCA and MSAD-PCA process monitoring statistics perform similarly, so we do not expect significant loss of power when using the MSAD-PCA procedure to detect faults.

Case Study Results

False Alarm Rates

We increased the flags to trigger an alarm from 3 to 5 for the real case due to strong serial autocorrelation. Note that these false alarm rates are all greater than or equal to the set threshold level of \(\alpha = 0.1\%\).

| | MSAD \(T^2\) | MSAD SPE | AD \(T^2\) | AD SPE |
|---|---|---|---|---|
| False Alarm Rate | 0.1% | 0.3% | 0.3% | 0.3% |

Fault Detection Times

Software Package for R

Package mvMonitoring

  • No website: our package is currently in private beta-testing at the facility.
  • mspProcessData() generates random draws from a serially autocorrelated and nonstationary multi-state (or single-state) multivariate process.
  • mspTrain() trains the projection matrix and non-parametric fault detection thresholds.
  • mspMonitor() assesses incoming process observations and classifies them as normal or abnormal.
  • mspWarning() keeps a running tally of abnormal observations and raises an alarm if necessary.
  • Supports piping: mspProcessData() %>% mspTrain() %>% mspMonitor() %>% mspWarning()
  • Alarms can be sent via email or SMS from the remote facility to the operators.

Summary and References

Summary

Overview

  • Explained the motivation for a multi-state modification to PCA.
  • Presented the MSAD-PCA dimension reduction procedure.
  • Described the simulation study used to validate the effectiveness of MSAD-PCA.
  • Discussed the simulation study and real data results.

MSAD-PCA Findings

  • Helps detect faults faster and more consistently in multi-state processes
  • Doesn't suffer loss of power when applied to single-state processes
  • Moves another step toward fault attribution

Future Work


Fault Detection


Fault Attribution


Process Automation

References

Baggiani, F., Marsili-Libelli, S., 2009. Real-time fault detection and isolation in biological wastewater treatment plants. Water Sci. Technol. 60, 2949–2961. doi:10.2166/wst.2009.723

Chouaib, C., Mohamed-Faouzi, H., Messaoud, D., 2013. Adaptive kernel principal component analysis for nonlinear dynamic process monitoring, in: Control Conference (ASCC), 2013 9th Asian. pp. 1–6. doi:10.1109/ASCC.2013.6606291

Dong, D., McAvoy, T.J., 1996. Batch tracking via nonlinear principal component analysis. AIChE J. 42, 2199–2208. doi:10.1002/aic.690420810

Ge, Z., Yang, C., Song, Z., 2009. Improved kernel PCA-based monitoring approach for nonlinear processes. Chem. Eng. Sci. 64, 2245–2255. doi:10.1016/j.ces.2009.01.050

Kazor, K., Holloway, R.W., Cath, T.Y., Hering, A.S., 2016. Comparison of linear and nonlinear dimension reduction techniques for automated process monitoring of a decentralized wastewater treatment facility. Stoch. Environ. Res. Risk Assess. 30, 1527–1544. doi:10.1007/s00477-016-1246-2

Kresta, J.V., MacGregor, J.F., Marlin, T.E., 1991. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 69, 35–47. doi:10.1002/cjce.5450690105

Miao, A., Song, Z., Ge, Z., Zhou, L., Wen, Q., 2013. Nonlinear fault detection based on locally linear embedding. J. Control Theory Appl. 11, 615–622. doi:10.1007/s11768-013-2102-2

Sanchez-Fernandez, A., Fuente, M.J., Sainz-Palmero, G.I., 2015. Fault detection in wastewater treatment plants using distributed PCA methods, in: 2015 IEEE 20th Conference on Emerging Technologies Factory Automation (ETFA). pp. 1–7. doi:10.1109/ETFA.2015.7301504

Tenenbaum, J.B., Silva, V. de, Langford, J.C., 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323. doi:10.1126/science.290.5500.2319

Weinberger, K.Q., Saul, L.K., 2006. Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vision 70, 77–90. doi:10.1007/s11263-005-4939-z

Wise, B.M., Veltkamp, D.J., Davis, B., Ricker, N.L., Kowalski, B.R., 1988. Principal components analysis for monitoring the West Valley Liquid-Fed Ceramic Melter, in: Management of Radioactive Wastes, and Non-Radioactive Wastes from Nuclear Facilities. Tucson, AZ.